Add support for Java classes with circular references #37738
Conversation
Can one of the admins verify this patch?
Hm, skipping them doesn't seem right either. Not sure if this should be an option; it is just something that doesn't make sense to encode.
If it's a field that the developer/application is comfortable having as a self/circular reference, then from Spark's perspective I think it should allow the developer to stop the loop gracefully (that is, skip further processing of the field, at the developer's own judgement). This PR does not force either way (fail the whole application immediately as today, or skip the field if the developer chooses to); it leaves the choice to the developer. Ultimately, the developers building their own applications have the best knowledge of how to handle it. I guess Spark probably assumed a circular reference must be a mistake made earlier by the developer/application. But it can really be a valid case, even if it is rare.
Can you describe a valid use case? I can't think of one. Encoders are used with data classes, bean-like classes.
Google Protobuf is an example; it's widely used as a data class. In the protobuf class there is a self-referencing attribute, so the current Spark implementation doesn't work with protobuf. There are some other examples on the issue:
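For reference, the shape under discussion is a class whose field type refers back to the class itself. The following is a minimal, hypothetical sketch (not Spark's actual inference code, and not a real protobuf class) showing why naive recursive field expansion over such a class never terminates unless a cycle guard stops it:

```java
import java.lang.reflect.Field;
import java.util.HashSet;
import java.util.Set;

// Hypothetical self-referencing bean, similar in shape to a
// protobuf message's self-referential attribute.
class Node {
    public int value;
    public Node next; // field whose type is the enclosing class: a cycle
}

public class CycleDemo {
    // Walk a class's fields recursively, the way schema inference must.
    // Without the 'seen' guard, the recursion on Node.next would never end.
    static int countFields(Class<?> cls, Set<Class<?>> seen) {
        if (!seen.add(cls)) {
            return 0; // cycle detected: stop instead of recursing forever
        }
        int count = 0;
        for (Field f : cls.getDeclaredFields()) {
            count++;
            if (!f.getType().isPrimitive()) {
                count += countFields(f.getType(), seen);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("fields visited: "
                + countFields(Node.class, new HashSet<>()));
    }
}
```

The `seen` set is the essential piece: remove it and the walk on `Node` recurses on itself indefinitely, which is the failure mode the encoder hits.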
Hey all,

Hey all,

Hi, I am having this

Hi @srowen, would you consider supporting
Still seems weird to me. I get it; it just seems a bit too hacky as the 'right' solution. Sometimes hacks are worth it.
What changes were proposed in this pull request?
If the target Java data class has a circular reference, Spark fails fast when creating the Dataset or running the Encoders.
This PR adds an option that lets developers decide whether to skip the circular field or leave the application to fail.
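The proposed semantics can be sketched as follows. This is a hedged illustration of the option's behavior (fail fast versus skip the circular field), not the PR's actual implementation; the class names, flag, and field-inference logic here are made up for the example:

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: mimics the choice this PR gives developers.
// skipCircularReferences=false: a cycle fails fast (current behavior).
// skipCircularReferences=true:  the circular field is dropped.
class SchemaWalker {
    final boolean skipCircularReferences;

    SchemaWalker(boolean skipCircularReferences) {
        this.skipCircularReferences = skipCircularReferences;
    }

    List<String> inferFields(Class<?> cls) {
        List<String> out = new ArrayList<>();
        walk(cls, new HashSet<>(), "", out);
        return out;
    }

    private void walk(Class<?> cls, Set<Class<?>> path,
                      String prefix, List<String> out) {
        if (path.contains(cls)) {
            if (skipCircularReferences) {
                return; // developer opted in: drop the circular field
            }
            throw new UnsupportedOperationException(
                "circular reference in " + cls.getName());
        }
        path.add(cls);
        for (Field f : cls.getDeclaredFields()) {
            if (f.getType().isPrimitive() || f.getType() == String.class) {
                out.add(prefix + f.getName());
            } else {
                walk(f.getType(), path, prefix + f.getName() + ".", out);
            }
        }
        path.remove(cls); // only cycles on the current path matter
    }
}

class Employee {
    public String name;
    public Employee manager; // self-reference
}

public class OptionDemo {
    public static void main(String[] args) {
        // Skipping: the schema keeps "name" and drops "manager".
        System.out.println(new SchemaWalker(true).inferFields(Employee.class));
        // Fail fast: the walk throws on the self-reference.
        try {
            new SchemaWalker(false).inferFields(Employee.class);
        } catch (UnsupportedOperationException e) {
            System.out.println("fail fast: " + e.getMessage());
        }
    }
}
```

Note that tracking only the current path (rather than all visited classes) is what distinguishes a true cycle from a class that merely appears twice in different branches of the schema.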
Why are the changes needed?
If the target Java data class has a circular reference, Spark fails fast when creating the Dataset or running the Encoders.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Issue: https://issues.apache.org/jira/browse/SPARK-33598